Automatic Cluster Stopping with Criterion Functions and the Gap Statistic
Abstract
SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense discrimination and name discrimination. It supports several measures for automatically determining the number of clusters into which a collection of contexts should be grouped. These measures can be used to discover the number of senses in which a word is used in a large corpus of text, or the number of entities that share the same name. Three of the measures are based on clustering criterion functions, and a fourth is based on the Gap Statistic.
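The abstract does not say how SenseClusters computes its stopping measures, so the following is only a sketch of the generic Gap Statistic of Tibshirani, Walther, and Hastie, not the SenseClusters implementation. It assumes dense numeric context vectors, squared Euclidean distance, a plain k-means with farthest-first seeding, and a uniform reference distribution over the bounding box of the data. The names `sq_dist`, `dispersion`, and `gap_statistic` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dispersion(points, k, rng, iters=15):
    """Within-cluster sum of squares W_k from one run of Lloyd's k-means."""
    # Farthest-first seeding (an illustrative choice): random first center,
    # then repeatedly add the point farthest from the centers chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    w = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return max(w, 1e-12)  # guard against log(0) on degenerate data


def gap_statistic(points, max_k, n_refs=10, seed=0):
    """Return (chosen_k, gaps), where gaps[k - 1] holds Gap(k)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, max_k + 1):
        log_w = math.log(dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box (an assumption).
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(math.log(dispersion(ref, k, rng)))
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean - log_w)                  # Gap(k) = E*[log W_k] - log W_k
        errs.append(sd * math.sqrt(1 + 1 / n_refs))  # s_k
    # Stop at the smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}.
    for k in range(1, max_k):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return max_k, gaps
```

Calling `gap_statistic(vectors, max_k=5)` on a list of equal-length tuples returns the selected number of clusters together with the Gap value for each candidate k. A single k-means run per k keeps the sketch short; a practical version would likely use several restarts and a similarity measure suited to sparse context vectors.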
Similar resources
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When a FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...
An Introduction to a New Criterion Proposed for Stopping GA Optimization Process of a Laminated Composite Plate
Several traditional stopping criteria in Genetic Algorithms (GAs) are applied to the optimization process of a typical laminated composite plate. The results show that neither of the criteria of the type of statistical parameters, nor those of the kinds of theoretical models performs satisfactorily in determining the interruption point for the GA process. Here, considering the configuration of ...
Cluster Stopping Rules For Word Sense Discrimination
As text data becomes plentiful, unsupervised methods for Word Sense Disambiguation (WSD) become more viable. A problem encountered in applying WSD methods is finding the exact number of senses an ambiguity has in a training corpus collected in an automated manner. That number is not known a priori; rather it needs to be determined based on the data itself. We address that problem using cluster ...
Unsupervised Domain Adaptation for I-vector Speaker Recognition
In this paper, we present a framework for unsupervised domain adaptation of PLDA based i-vector speaker recognition systems. Given an existing out-of-domain PLDA system, we use it to cluster unlabeled in-domain data, and then use this data to adapt the parameters of the PLDA system. We explore two versions of agglomerative hierarchical clustering that use the PLDA system. We also study two auto...
Automatic concept identification in goal-oriented conversations
We address the problem of identifying key domain concepts automatically from an unannotated corpus of goal-oriented human-human conversations. We examine two clustering algorithms, one based on mutual information and another one based on Kullback-Leibler distance. In order to compare the results from both techniques quantitatively, we evaluate the outcome clusters against reference concept labe...
Publication date: 2006